{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2020 Google Inc. All Rights Reserved.\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#            http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploying NVIDIA Triton Inference Server in AI Platform Prediction Custom Container (REST API)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we will walk through the process of deploying NVIDIA's Triton Inference Server into AI Platform Prediction Custom Container service in the Direct Model Server mode:\n",
    "\n",
    "![](img/caip_triton_container_diagram_direct.jpg)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT_ID='[Enter project name - REQUIRED]'\n",
    "REPOSITORY='caipcustom'\n",
    "REGION='us-central1'\n",
    "TRITON_VERSION='20.06'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "import requests\n",
    "import json\n",
    "\n",
    "MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,random.randint(10000,99999))\n",
    "ENDPOINT='https://{}-ml.googleapis.com/v1'.format(REGION)\n",
    "TRITON_IMAGE='tritonserver:{}-py3'.format(TRITON_VERSION)\n",
    "CAIP_IMAGE='{}-docker.pkg.dev/{}/{}/{}'.format(REGION,PROJECT_ID,REPOSITORY,TRITON_IMAGE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "# Test values\n",
    "\n",
    "PROJECT_ID='tsaikevin-1238'\n",
    "REPOSITORY='caipcustom'\n",
    "REGION='us-central1'\n",
    "TRITON_VERSION='20.06'\n",
    "\n",
    "import os\n",
    "import random\n",
    "import requests\n",
    "import json\n",
    "\n",
    "MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,random.randint(10000,99999))\n",
    "ENDPOINT='https://{}-ml.googleapis.com/v1'.format(REGION)\n",
    "TRITON_IMAGE='tritonserver:{}-py3'.format(TRITON_VERSION)\n",
    "CAIP_IMAGE='{}-docker.pkg.dev/{}/{}/{}'.format(REGION,PROJECT_ID,REPOSITORY,TRITON_IMAGE)\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"PROJECT_ID\"]=PROJECT_ID\n",
    "os.environ[\"MODEL_BUCKET\"]=MODEL_BUCKET\n",
    "os.environ[\"ENDPOINT\"]=ENDPOINT\n",
    "os.environ[\"CAIP_IMAGE\"]=CAIP_IMAGE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the Artifact Registry\n",
    "This will be used to store the container image for the model server Triton."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud beta artifacts repositories create $REPOSITORY --repository-format=docker --location=$REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud beta auth configure-docker $REGION-docker.pkg.dev --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare the container\n",
    "We will make a copy of the Triton container image into the Artifact Registry, where AI Platform Custom Container Prediction will only pull from during Model Version setup. The following steps will download the NVIDIA Triton Inference Server container to your VM, then upload it to your repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!docker pull nvcr.io/nvidia/$TRITON_IMAGE && \\\n",
    " docker tag nvcr.io/nvidia/$TRITON_IMAGE $CAIP_IMAGE && \\\n",
    " docker push $CAIP_IMAGE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare model Artifacts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Clone the NVIDIA Triton Inference Server repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!git clone -b r$TRITON_VERSION https://github.com/triton-inference-server/server.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the GCS bucket where the model artifacts will be copied to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil mb $MODEL_BUCKET"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Stage model artifacts and copy to bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir model_repository"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cp -R server/docs/examples/model_repository/* model_repository/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!./server/docs/examples/fetch_models.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil -m cp -R model_repository/ $MODEL_BUCKET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil ls -Rl $MODEL_BUCKET/model_repository"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare request payload\n",
    "\n",
    "To prepare the payload format, we have included a utility get_request_body_simple.py.  To use this utility, install the following library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install geventhttpclient"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Prepare non-binary request payload\n",
    "\n",
    "The first model will illustrate a non-binary payload.  The following command will create a KF Serving v2 format non-binary payload to be used with the \"simple\" model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 get_request_body_simple.py -m simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Prepare binary request payload\n",
    "\n",
    "Triton's implementation of KF Serving v2 protocol for binary data appends the binary data after the json body.  Triton requires an additional header for offset:\n",
    "\n",
    "`Inference-Header-Content-Length: [offset]`\n",
    "\n",
    "We have provided a script that will automatically resize the image to the proper size for ResNet-50 [224, 224, 3] and calculate the proper offset.  The following command takes an image file and outputs the necessary data structure to be use with the \"resnet50_netdef\" model.  Please note down this offset as it will be used later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 get_request_body_simple.py -m image -f server/qa/images/mug.jpg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create and deploy Model and Model Version\n",
    "\n",
    "In this section, we will deploy two models:\n",
    "1. Simple model with non-binary data.  KF Serving v2 protocol specifies a json format with non-binary data in the json body itself.\n",
    "2. Binary data model with ResNet-50.  Triton's implementation of binary data for KF Server v2 protocol."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple model (non-binary data)\n",
    "\n",
    "#### Create Model\n",
    "\n",
    "AI Platform Prediction uses a Model/Model Version Hierarchy, where the Model is a logical grouping of Model Versions.  We will first create the Model.\n",
    "\n",
    "Because the MODEL_NAME variable will be used later to specify the predict route, and Triton will use that route to run prediction on a specific model, we must set the value of this variable to a valid name of a model.  For this section, will use the \"simple\" model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env MODEL_NAME=simple"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X \\\n",
    "    POST -v -k -H \"Content-Type: application/json\" \\\n",
    "    -d \"{'name': '\"$MODEL_NAME\"'}\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Model Version\n",
    "\n",
    "After the Model is created, we can now create a Model Version under this Model.  Each Model Version will need a name that is unique within the Model.  In AI Platform Prediction Custom Container, a {Project}/{Model}/{ModelVersion} uniquely identifies the specific container and model artifact used for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env VERSION_NAME=v01"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following specifications tell AI Platform how to create the Model Version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "triton_simple_version = {\n",
    "  \"name\": os.getenv(\"VERSION_NAME\"),\n",
    "  \"deployment_uri\": os.getenv(\"MODEL_BUCKET\")+\"/model_repository\",\n",
    "  \"container\": {\n",
    "    \"image\": os.getenv(\"CAIP_IMAGE\"),\n",
    "    \"args\": [\"tritonserver\",\n",
    "             \"--model-repository=$(AIP_STORAGE_URI)\"\n",
    "    ],\n",
    "    \"env\": [\n",
    "    ], \n",
    "    \"ports\": [\n",
    "      { \"containerPort\": 8000 }\n",
    "    ]\n",
    "  },\n",
    "  \"routes\": {\n",
    "    \"predict\": \"/v2/models/\"+os.getenv(\"MODEL_NAME\")+\"/infer\",\n",
    "    \"health\": \"/v2/models/\"+os.getenv(\"MODEL_NAME\")\n",
    "  },\n",
    "  \"machine_type\": \"n1-standard-4\",\n",
    "  \"acceleratorConfig\": {\n",
    "    \"count\":1,\n",
    "    \"type\":\"nvidia-tesla-t4\"\n",
    "  },\n",
    "  \"autoScaling\": {\n",
    "    \"minNodes\": 1\n",
    "  }\n",
    "}\n",
    "\n",
    "with open(\"triton_simple_version.json\", \"w\") as f: \n",
    "  json.dump(triton_simple_version, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X \\\n",
    "    POST -v -k -H \"Content-Type: application/json\" \\\n",
    "    -d @triton_simple_version.json \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check the status of Model Version creation\n",
    "\n",
    "Creating a Model Version may take several minutes.  You can check on the status of this specfic Model Version with the following, and a successful deployment will show:\n",
    "\n",
    "`\"state\": \"READY\"`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X GET -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}\" "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### To list all Model Versions and their states in this Model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X GET -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/\" "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run prediction using `curl`\n",
    "\n",
    "The \"simple\" model takes two tensors with shape [1,16] and does a couple of basic arithmetic operation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}:predict \\\n",
    "    -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    -d '{ \\\n",
    "            \"id\": \"0\", \\\n",
    "            \"inputs\": [ \\\n",
    "                { \\\n",
    "                    \"name\": \"INPUT0\", \\\n",
    "                    \"shape\": [1, 16], \\\n",
    "                    \"datatype\": \"INT32\", \\\n",
    "                    \"parameters\": {}, \\\n",
    "                    \"data\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] \\\n",
    "                }, \\\n",
    "                { \\\n",
    "                    \"name\": \"INPUT1\", \\\n",
    "                    \"shape\": [1, 16], \\\n",
    "                    \"datatype\": \"INT32\", \\\n",
    "                    \"parameters\": {}, \\\n",
    "                    \"data\": [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1] \\\n",
    "                } \\\n",
    "            ] \\\n",
    "        }'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this works: gcloud auth application-default print-access-token\n",
    "\n",
    "!curl -X POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}:predict \\\n",
    "    -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth application-default print-access-token`\" \\\n",
    "    -d '{ \\\n",
    "            \"id\": \"0\", \\\n",
    "            \"inputs\": [ \\\n",
    "                { \\\n",
    "                    \"name\": \"INPUT0\", \\\n",
    "                    \"shape\": [1, 16], \\\n",
    "                    \"datatype\": \"INT32\", \\\n",
    "                    \"parameters\": {}, \\\n",
    "                    \"data\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] \\\n",
    "                }, \\\n",
    "                { \\\n",
    "                    \"name\": \"INPUT1\", \\\n",
    "                    \"shape\": [1, 16], \\\n",
    "                    \"datatype\": \"INT32\", \\\n",
    "                    \"parameters\": {}, \\\n",
    "                    \"data\": [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1] \\\n",
    "                } \\\n",
    "            ] \\\n",
    "        }'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run prediction using Using `requests` library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('simple.json', 'r') as s:\n",
    "    data=s.read()\n",
    "    \n",
    "PREDICT_URL = \"{}/projects/{}/models/{}/versions/{}:predict\".format(ENDPOINT, PROJECT_ID, os.getenv('MODEL_NAME'), os.getenv('VERSION_NAME'))\n",
    "HEADERS = {\n",
    "  'Content-Type': 'application/json',\n",
    "  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth print-access-token').read().rstrip())\n",
    "}\n",
    "\n",
    "response = requests.request(\"POST\", PREDICT_URL, headers=HEADERS, data = data).content.decode()\n",
    "\n",
    "json.loads(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this works: gcloud auth application-default print-access-token\n",
    "\n",
    "with open('simple.json', 'r') as s:\n",
    "    data=s.read()\n",
    "    \n",
    "PREDICT_URL = \"{}/projects/{}/models/{}/versions/{}:predict\".format(ENDPOINT, PROJECT_ID, os.getenv('MODEL_NAME'), os.getenv('VERSION_NAME'))\n",
    "HEADERS = {\n",
    "  'Content-Type': 'application/json',\n",
    "  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth application-default print-access-token').read().rstrip())\n",
    "}\n",
    "\n",
    "response = requests.request(\"POST\", PREDICT_URL, headers=HEADERS, data = data).content.decode()\n",
    "\n",
    "json.loads(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ResNet-50 model (binary data)\n",
    "\n",
    "#### Create Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env BINARY_MODEL_NAME=resnet50_netdef"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -X POST -v -k -H \"Content-Type: application/json\" \\\n",
    "  -d \"{'name': '\"$BINARY_MODEL_NAME\"'}\" \\\n",
    "  -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "  \"${ENDPOINT}/projects/${PROJECT_ID}/models/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Model Version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env BINARY_VERSION_NAME=v1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "triton_binary_version = {\n",
    "  \"name\": os.getenv(\"BINARY_VERSION_NAME\"),\n",
    "  \"deployment_uri\": os.getenv(\"MODEL_BUCKET\")+\"/model_repository\",\n",
    "  \"container\": {\n",
    "    \"image\": os.getenv(\"CAIP_IMAGE\"),\n",
    "    \"args\": [\"tritonserver\",\n",
    "             \"--model-repository=$(AIP_STORAGE_URI)\"\n",
    "    ],\n",
    "    \"env\": [\n",
    "    ], \n",
    "    \"ports\": [\n",
    "      { \"containerPort\": 8000 }\n",
    "    ]\n",
    "  },\n",
    "  \"routes\": {\n",
    "    \"predict\": \"/v2/models/\"+os.getenv(\"BINARY_MODEL_NAME\")+\"/infer\",\n",
    "    \"health\": \"/v2/models/\"+os.getenv(\"BINARY_MODEL_NAME\")\n",
    "  },\n",
    "  \"machine_type\": \"n1-standard-4\",\n",
    "  \"acceleratorConfig\": {\n",
    "    \"count\":1,\n",
    "    \"type\":\"nvidia-tesla-t4\"\n",
    "  },\n",
    "  \"autoScaling\": {\n",
    "    \"minNodes\": 1\n",
    "  }\n",
    "}\n",
    "\n",
    "with open(\"triton_binary_version.json\", \"w\") as f: \n",
    "  json.dump(triton_binary_version, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request POST -v -k -H \"Content-Type: application/json\" \\\n",
    "  -d @triton_binary_version.json \\\n",
    "  -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "  ${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check Model Version status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request GET -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}\" "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run prediction using `curl`\n",
    "\n",
    "Recall the offset value calcuated above.  The binary case has an additional header:\n",
    "\n",
    "`Inference-Header-Content-Length: [offset]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}:predict \\\n",
    "    -k -H \"Content-Type: application/octet-stream\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    -H \"Inference-Header-Content-Length: 138\" \\\n",
    "    --data-binary @payload.dat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this works: gcloud auth application-default print-access-token\n",
    "\n",
    "!curl --request POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}:predict \\\n",
    "    -k -H \"Content-Type: application/octet-stream\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth application-default print-access-token`\" \\\n",
    "    -H \"Inference-Header-Content-Length: 138\" \\\n",
    "    --data-binary @payload.dat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run prediction using Using `requests` library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('payload.dat', 'rb') as s:\n",
    "    data=s.read()\n",
    "\n",
    "PREDICT_URL = \"{}/projects/{}/models/{}/versions/{}:predict\".format(ENDPOINT, PROJECT_ID, os.getenv('BINARY_MODEL_NAME'), os.getenv('BINARY_VERSION_NAME'))\n",
    "HEADERS = {\n",
    "  'Content-Type': 'application/octet-stream',\n",
    "  'Inference-Header-Content-Length': '138',\n",
    "  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth print-access-token').read().rstrip())\n",
    "}\n",
    "\n",
    "response = requests.request(\"POST\", PREDICT_URL, headers=HEADERS, data = data).content.decode()\n",
    "\n",
    "json.loads(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this works: gcloud auth application-default print-access-token\n",
    "\n",
    "with open('payload.dat', 'rb') as s:\n",
    "    data=s.read()\n",
    "\n",
    "PREDICT_URL = \"{}/projects/{}/models/{}/versions/{}:predict\".format(ENDPOINT, PROJECT_ID, os.getenv('BINARY_MODEL_NAME'), os.getenv('BINARY_VERSION_NAME'))\n",
    "HEADERS = {\n",
    "  'Content-Type': 'application/octet-stream',\n",
    "  'Inference-Header-Content-Length': '138',\n",
    "  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth application-default print-access-token').read().rstrip())\n",
    "}\n",
    "\n",
    "response = requests.request(\"POST\", PREDICT_URL, headers=HEADERS, data = data).content.decode()\n",
    "\n",
    "json.loads(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request DELETE -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}\" "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request DELETE -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}\" "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request DELETE -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}\" "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl --request DELETE -k -H \"Content-Type: application/json\" \\\n",
    "    -H \"Authorization: Bearer `gcloud auth print-access-token`\" \\\n",
    "    \"${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}\" "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gsutil -m rm -r -f $MODEL_BUCKET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf model_repository triton-inference-server server *.yaml *.dat *.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf-cpu.1-15.m56",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf-cpu.1-15:m56"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
